III. Mixing two distributions
IV. Advanced problem: Occupancy of Swiss breeding birds
Models that “mix” parameters from more than one distribution provide virtually infinite flexibility in describing how data arise from socio-ecological processes. Such distributions, of course, can be bi- or multimodal, but they can also be used to create distributions that are “flatter” than the usual parametric toolbox can accommodate. This exercise provides familiarity with three applications of mixture models: inflating zeros, combining two parametric distributions, and, for the speedy and ambitious, modeling occupancy.
Zero inflation is a special case of mixture models. In this case we have a process that produces 0’s for a discrete random variable with probability \(p\). With probability \(1-p\), a second process produces a distribution of values for the random variable (including 0), \([y\mid\boldsymbol\theta]\). Zero inflation is a common problem in ecology and social science.
Data simulation is a terrific way to check your model code because you know the parameter values that gave rise to the simulated data. This knowledge allows you to compare the means of the marginal posterior distributions approximated by your model against the values used to generate the data used to fit the model. Data simulation also forces you to think carefully about your model. Consider the following model for count data:
\[\begin{align*} y_{i}&\sim\begin{cases} \text{Poisson}(\lambda) & \mbox{if }z_{i}=0\\ 0 & \mbox{if }z_{i}=1 \end{cases}\\ z_{i}&\sim\text{Bernoulli}(p), \end{align*}\]where \(p\) is the probability of a zero. The full posterior and joint distributions can be written as
\[[\lambda,p,\mathbf{z}\mid \mathbf{y}]\propto\prod_{i=1}^n\text{Poisson}(y_i\mid\lambda(1-z_i))\,\text{Bernoulli}(z_i\mid p)\,\text{gamma}(\lambda\mid.001,.001)\,\text{uniform}(p\mid0,1).\]
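Marginalizing over the latent \(z_i\) shows why the zeros are “inflated”: a zero can arise from either process, so \[\Pr(y_i=0)=p+(1-p)e^{-\lambda},\qquad \Pr(y_i=k)=(1-p)\frac{\lambda^{k}e^{-\lambda}}{k!}\ \mbox{ for } k>0.\]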
Simulate 500 data points from the joint distribution assuming \(\lambda\)=5.3 and \(p\)=.1. Fit the model in JAGS to see if you can recover the parameters. They will not match exactly. Why not? Conduct posterior predictive checks.
The fit parameters and the generating parameters will be close but not identical. This is because the data are a single realization of a stochastic process. The means across many replicated data simulations and associated model fits will converge to the generating values given sufficient replications.
library(arm) # for discrete histogram
library(rjags)
## Linked to JAGS 4.2.0
## Loaded modules: basemod,bugs
p = .1
lambda = 5.3
n = 500
z = rbinom(n, prob = p, size = 1) # latent indicator of an extra zero
y = (1 - z) * rpois(n, lambda)
discrete.histogram(y, xlab = "y", main = "Simulated data")
{ # Extra bracket needed only for R markdown files
sink("ZIP.R")
cat("
model{
  # priors
  lambda ~ dgamma(.001, .001)
  p ~ dunif(0, 1)
  # likelihood
  for(i in 1:length(y)){
    z[i] ~ dbern(p)                       # latent indicator of an extra zero
    y[i] ~ dpois(lambda * (1 - z[i]))
    y.sim[i] ~ dpois(lambda * (1 - z[i])) # simulated data for posterior predictive checks
  } # end of i
  # test statistics and Bayesian p values
  mean.y = mean(y)
  mean.y.sim = mean(y.sim)
  sd.y = sd(y)
  sd.y.sim = sd(y.sim)
  p.mean = step(mean.y.sim - mean.y)
  p.sd = step(sd.y.sim - sd.y)
} # end of model
", fill = TRUE)
sink()
}
data = list(y=y)
jm = jags.model("ZIP.R", data=data, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 500
## Unobserved stochastic nodes: 1002
## Total graph size: 5516
##
## Initializing model
update(jm, n.iter=1000)
zc = coda.samples(jm,variable.names=c("p","lambda", "p.mean", "p.sd"), n.iter=1000)
gelman.diag(zc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## lambda 1 1
## p 1 1
## p.mean 1 1
## p.sd 1 1
##
## Multivariate psrf
##
## 1
summary(zc)
##
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## lambda 5.1137 0.1152 0.0021037 0.0022277
## p 0.1923 0.0177 0.0003231 0.0003293
## p.mean 0.4970 0.5001 0.0091301 0.0092900
## p.sd 0.6240 0.4845 0.0088450 0.0089791
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## lambda 4.8967 5.0365 5.1110 5.192 5.3465
## p 0.1594 0.1802 0.1919 0.204 0.2285
## p.mean 0.0000 0.0000 0.0000 1.000 1.0000
## p.sd 0.0000 0.0000 1.0000 1.000 1.0000
You mixed a scalar 0 with a distribution \(\text{Poisson}(y_i\mid \lambda)\) in the example above. Here, you will learn how to mix two distributions. You saw in lecture how to mix two distributions to represent differences in beak size between sexes of Darwin’s finches. (How else might you attack this problem?). The math for this problem is simple, but there are a couple of tricks for implementing it in JAGS, shown below.
The concentration of metabolizable energy (ME) in plants (kjoule / g dry matter) and the biomass of plants ultimately determine how many herbivores can exist in a habitat. Herbaceous plants contain higher concentrations of ME than woody plants because their cell walls contain less lignin. This creates a bimodal distribution of ME in plant communities that include woody and herbaceous plants, as illustrated in the histogram below, which depicts the probability density of random samples of plant tissue collected from mountain shrub communities in Colorado.
y=read.csv("ME_data.csv")
hist(y$ME,breaks = 20, xlab = "Metabolizable energy (kjoule / g)", main="", freq=FALSE)
A model of these data that used a unimodal distribution (What would be good choices based on the support of ME?) would fail posterior predictive checks because the model would not be capable of giving rise to the data. In this problem, you will mix two means and two standard deviations using a parameter \(p\) that controls the relative weight of each component in the final, bimodal distribution, writing JAGS code that returns the posterior distributions of the means and standard deviations for each plant type and of the mixing parameter.
Here are some hints to get started. Choose sensible priors for two distributions parameterized by their moments. Here is some starter JAGS code:
sigma[1] ~ dgamma(4/100, 2/100)                # prior mean 2, variance 100
sigma[2] ~ dgamma(25/200, 5/200) T(sigma[1],)  # prior mean 5, variance 200
mu[1] ~ dgamma(4/100, 2/100)                   # prior mean 2, variance 100
mu[2] ~ dgamma(49/500, 7/500) T(mu[1],)        # prior mean 7, variance 500
p ~ dunif(0,1) # mixing parameter
Note a couple of things. First, the priors are centered on reasonable values for the means and standard deviations based on well-established knowledge of the composition of plant tissue, but they have large variances, which makes them only weakly informative. Second, and critically, note the code T(sigma[1],) and T(mu[1],). The T() ensures that draws for mu[2] are always greater than draws for mu[1] at any given iteration of the MCMC algorithm and, similarly, that draws for sigma[2] always exceed those for sigma[1]. The code will not converge if the T()’s are not present. It is probably a good idea to specify initial values for mu[2] greater than those for mu[1], and initial values for sigma[2] greater than those for sigma[1], although I was able to obtain good results without doing so.
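As a quick check on the moment matching behind these priors (using, for illustration, the first component's implied prior mean of 2 and standard deviation of 10), you can verify in R that \(\alpha=\mu^2/\sigma^2\) and \(\beta=\mu/\sigma^2\) recover the intended moments:

```r
prior.mean <- 2   # implied prior mean of the first component
prior.sd <- 10    # implied prior standard deviation
alpha <- prior.mean^2 / prior.sd^2  # = 4/100, the shape in dgamma(4/100, 2/100)
beta <- prior.mean / prior.sd^2     # = 2/100, the rate
c(mean = alpha / beta, sd = sqrt(alpha / beta^2))  # recovers 2 and 10
```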
You will now need to choose a likelihood appropriate for the support of the random variable and properly match its moments to the parameters of your chosen distribution. What is the support of the random variable ME? Next, you will make draws of a latent indicator variable \(z_i \sim \text{Bernoulli}(p)\) and use it to mix the parameters of the two components. The probability of the data conditional on the parameters (i.e., the likelihood) will be composed as \[y_i \sim \text{gamma}\big(z_i\alpha_1+(1-z_i)\alpha_2,\; z_i\beta_1+(1-z_i)\beta_2\big),\] where the \(\alpha_j\) and \(\beta_j\) are obtained by moment matching from the \(\mu_j\) and \(\sigma_j\).
Write the full posterior and joint distributions. Write JAGS code to find marginal posterior distributions of the unobserved quantities. Conduct posterior predictive checks. Overlay a density plot of simulated data on a histogram of real data. What is the probability that a sample of plant tissue drawn from this community will have an ME concentration greater than 10 k joule / g?
library(rjags)
{ # Extra bracket needed only for R markdown files
sink("fit_distJAGS.R")
cat("
model{
  # priors; T() keeps component 2 above component 1
  sigma[1] ~ dgamma(4/100, 2/100)
  sigma[2] ~ dgamma(25/200, 5/200) T(sigma[1],)
  mu[1] ~ dgamma(4/100, 2/100)
  mu[2] ~ dgamma(49/500, 7/500) T(mu[1],)
  p ~ dunif(0, 1) # mixing parameter
  # moment matching for the gamma likelihood
  alpha[1] <- mu[1]^2 / sigma[1]^2
  beta[1] <- mu[1] / sigma[1]^2
  alpha[2] <- mu[2]^2 / sigma[2]^2
  beta[2] <- mu[2] / sigma[2]^2
  for(i in 1:length(y)){
    z[i] ~ dbern(p) # latent component membership
    alpha.mix[i] <- z[i] * alpha[1] + (1 - z[i]) * alpha[2]
    beta.mix[i] <- z[i] * beta[1] + (1 - z[i]) * beta[2]
    y[i] ~ dgamma(alpha.mix[i], beta.mix[i])
    y.new[i] ~ dgamma(alpha.mix[i], beta.mix[i]) # simulated data for posterior predictive checks
  } # end of i
  # test statistics and Bayesian p values
  sd.y = sd(y[])
  sd.new = sd(y.new[])
  p.sd = step(sd.new - sd.y)
  mean.y = mean(y[])
  mean.new = mean(y.new[])
  p.mean = step(mean.new - mean.y)
} # end of model
", fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files
data = list(y=y$ME)
inits = list(
list(mu=c(2,5), sigma=c(1,3)),
list(mu=c(1,3), sigma = c(2,6))
)
jm=jags.model("fit_distJAGS.R", data=data, inits=inits, n.chains = length(inits), n.adapt = 3000)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 531
## Unobserved stochastic nodes: 1067
## Total graph size: 9628
##
## Initializing model
update(jm, n.iter = 5000)
zc=coda.samples(jm, variable.names=c("mu", "sigma", "p"), n.iter=5000)
zj = jags.samples(jm, variable.names=c("y.new", "p.sd", "p.mean"), n.iter=5000)
hist(y$ME, freq=FALSE, main = "", xlab="Metabolizable energy (kjoule / g)", ylab = "Probability density", ylim=c(0,.2), breaks = 20)
lines(density(zj$y.new), lwd=2)
1-ecdf(zj$y.new)(10)
## [1] 0.140136
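An equivalent way to compute this quantity is the proportion of simulated draws that exceed 10:

```r
# Same value as 1 - ecdf(zj$y.new)(10): the posterior predictive
# probability that a tissue sample exceeds 10 kjoule/g.
mean(zj$y.new > 10)
```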
You could also attack this problem by using an indicator variable for sex, something like
\[\begin{align*} \mu_i &= \beta_0 + \beta_1x_i\\ y_i &\sim \text{lognormal}(\log(\mu_i), \sigma^2) \end{align*}\]where \(x_i\) is 0 if the bird is female and 1 if the bird is male. \(\beta_1\) is the increase in beak size resulting from being male.
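A minimal JAGS sketch of this alternative (the priors here are placeholders, and `x[i]` is assumed to code sex as above):

```r
# JAGS fragment, not a complete script; priors are illustrative only.
# b0 is the female beak size, kept positive so log(mu[i]) is defined.
b0 ~ dunif(0, 100)
b1 ~ dnorm(0, .001)
tau ~ dgamma(.001, .001) # precision on the log scale
for(i in 1:length(y)){
  mu[i] <- b0 + b1 * x[i]
  y[i] ~ dlnorm(log(mu[i]), tau)
}
```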
Modeling presence or absence is a classic problem involving mixture models, specifically random variables that are zero-inflated. Extra zeros are encountered when we model presence or absence because zeros arise from two conditions: truly absent individuals and individuals that are present but undetected. This means we need a model for the process that controls occupancy, the true state, and a model of the data that accounts for detection. This is often our starting point in Bayesian analysis – there is a true, unobserved state we seek to understand using a model of a process. We take imperfect observations of that state and must correct them using a model of the data.
A fundamental question in landscape ecology seeks to understand how landscape structure shapes variation in the abundance of species. We will use data from the Swiss Survey of Common Breeding Birds, courtesy of Royle and Dorazio (2008), to model habitat occupancy by a common, resident bird in the Swiss Alps, the willow tit (Parus montanus). The data come from annual surveys of one km2 quadrats distributed across Switzerland (Fig. 1). Surveys are conducted during the breeding season on three separate days, but some quadrats have missing data so that the number of replicate observations is fewer than three.
During each survey, an observer records every visual or acoustic detection of a breeding species (we do not differentiate between these two types of detection in this problem) and marks its location using a global positioning system or, in earlier years, a paper map. We assume that the true state (occupied or unoccupied) does not change among sample dates, an assumption known as closure. This assumption is reasonable because we are observing a resident species during the breeding season. We need to use the data to find the probability that a site is occupied, \(\psi\), as a function of covariates, and the probability that a bird is detected at a site, \(p\), conditional on the site being occupied.
Fig. 1. The willow tit (left, c/o of Francis C. Franklin) is one of 70 bird species that are surveyed annually for abundance in 267 1 km2 sampling units distributed across Switzerland (right, c/o the Swiss Ornithological Institute).
We want to understand the influence of forest cover and elevation on the distribution of the willow tit. The data frame SwissBirds has the number of times a quadrat (quadrat) was searched (numberVisits) and the number of times willow tits were detected (numberDetections). We have covariates on forest canopy cover (forestCover) as well as elevation in meters (elevation) for each quadrat surveyed. Develop a model of the influence of forest cover and elevation on the distribution of willow tits. Your model should allow you to find the posterior distribution of the optimum elevation of willow tit habitat at the mean forest cover, where optimum elevation is defined as the elevation where probability of occupancy is maximum.
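One way to write such a model, with \(z_i\) the true occupancy state of quadrat \(i\), \(\psi_i\) its occupancy probability, \(x_{1,i}\) standardized elevation, \(x_{2,i}\) standardized forest cover, \(n_i\) the number of visits, and \(y_i\) the number of detections, is \[\begin{align*} \text{logit}(\psi_i) &= \beta_1 + \beta_2x_{1,i} + \beta_3x_{1,i}^2 + \beta_4x_{2,i}\\ z_i &\sim \text{Bernoulli}(\psi_i)\\ y_i &\sim \text{binomial}(n_i,\, z_ip). \end{align*}\]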
Hint: use the R scale function to scale these covariates (this will drastically speed convergence).
{ # Extra bracket needed only for R markdown files
sink("SwissBirds.R")
cat("
model{
  # priors
  p ~ dbeta(1, 1)
  for (i in 1:4){
    beta[i] ~ dnorm(0, .368)
  }
  # likelihood
  for (i in 1:N){
    z[i] ~ dbern(psi[i])
    logit(psi[i]) <- beta[1] + beta[2]*elevation[i] + beta[3]*elevation[i]^2 + beta[4]*forestCover[i]
    y[i] ~ dbin(z[i] * p, n[i])
  }
  # derived quantities
  elevationMaxScaled <- -beta[2]/(2 * beta[3])
  elevationMax <- elevationMaxScaled * sdElevation + muElevation
  for (j in 1:length(elevationPredict)){
    # note that the x's are standardized on the R side
    logit(psiPredict[j]) <- beta[1] + beta[2]*elevationPredict[j] + beta[3]*elevationPredict[j]^2
  }
}
", fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files
library(SESYNCBayes)
library(rjags)
set.seed(1)
elevation <- scale(SwissBirds$elevation)
muElevation <- mean(SwissBirds$elevation)
sdElevation <- sd(SwissBirds$elevation)
elevation_Predict = seq(500,2500,10) # for plotting probability of occupancy vs unstandardized elevation
elevationPredict_stand = (elevation_Predict - muElevation)/sdElevation #Standardized for making predictions of probability of occupancy at new elevations
forestCover <- scale(SwissBirds$forestCover)
muforestCover <- mean(SwissBirds$forestCover)
sdforestCover <- sd(SwissBirds$forestCover)
inits = list(
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)))
data = list(
N = nrow(SwissBirds),
elevation = as.double(elevation),
forestCover = as.double(forestCover),
n = as.double(SwissBirds$numberVisits),
y = as.double(SwissBirds$numberDetections),
muElevation = as.double(muElevation),
sdElevation = as.double(sdElevation),
elevationPredict = as.double(elevationPredict_stand)) #note standardized for predictions
n.adapt = 5000
n.update = 10000
n.iter = 10000
jm = jags.model("SwissBirds.R", data = data, inits = inits, n.chains = length(inits), n.adapt = n.adapt)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 237
## Unobserved stochastic nodes: 242
## Total graph size: 4128
##
## Initializing model
update(jm, n.iter = n.update)
zm = coda.samples(jm, variable.names = c("p", "beta"), n.iter = n.iter, thin = 1)
zj = jags.samples(jm, variable.names = c("elevationMax", "psiPredict"), n.iter = n.iter, thin = 1)
summary(zm)
##
## Iterations = 15001:25000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] -0.2109 0.2709 0.0015642 0.0040810
## beta[2] 1.9660 0.3016 0.0017413 0.0044497
## beta[3] -1.0851 0.2642 0.0015255 0.0047449
## beta[4] 0.8520 0.2278 0.0013153 0.0027371
## p 0.7869 0.0292 0.0001686 0.0002431
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.7409 -0.3950 -0.2103 -0.02627 0.3146
## beta[2] 1.4186 1.7578 1.9514 2.15813 2.6008
## beta[3] -1.6205 -1.2596 -1.0776 -0.90183 -0.5864
## beta[4] 0.4234 0.6943 0.8453 1.00228 1.3121
## p 0.7265 0.7680 0.7881 0.80695 0.8411
plot(zm)
gelman.diag(zm)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## beta[1] 1 1
## beta[2] 1 1
## beta[3] 1 1
## beta[4] 1 1
## p 1 1
##
## Multivariate psrf
##
## 1
heidel.diag(zm)
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.2305
## beta[2] passed 1 0.0733
## beta[3] passed 1 0.2100
## beta[4] passed 1 0.8835
## p passed 2001 0.1356
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.206 0.014014
## beta[2] passed 1.957 0.015627
## beta[3] passed -1.082 0.016508
## beta[4] passed 0.848 0.009371
## p passed 0.785 0.000951
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.369
## beta[2] passed 1 0.390
## beta[3] passed 1 0.272
## beta[4] passed 1 0.453
## p passed 1 0.157
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.213 0.013252
## beta[2] passed 1.971 0.015292
## beta[3] passed -1.087 0.015634
## beta[4] passed 0.853 0.008740
## p passed 0.788 0.000823
##
## [[3]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.534
## beta[2] passed 1 0.381
## beta[3] passed 1 0.989
## beta[4] passed 1 0.186
## p passed 1 0.412
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.214 0.014277
## beta[2] passed 1.970 0.014370
## beta[3] passed -1.086 0.016170
## beta[4] passed 0.855 0.009738
## p passed 0.787 0.000811
We are entitled to conclude that elevation is roughly twice as important as forest cover because the elevation coefficient is roughly twice as large as the forest cover coefficient and because the covariates have been standardized.
You need to formulate a deterministic model that has a maximum and that is relatively easy to differentiate, something like \(g(\mu_i) = \beta_0+ \beta_1x_{1,i}+\beta_2x_{1,i}^2+\beta_3x_{2,i}\), where \(\mathbf{x}_1\) are the data on elevation and \(\mathbf{x}_2\) are the data on forest cover, standardized so that \(x_{2,i} = 0\) at the mean forest cover. Dropping the \(i\) subscript for more general notation and evaluating at the mean forest cover, we have \[\mu=\beta_0+ \beta_1x_1+\beta_2x_1^2.\]
To find the maximum, we take the first derivative \(\frac {d\mu}{dx_1}=\beta_1+ 2\beta_2 x_{1}\), set it to zero, \(0=\beta_1+ 2\beta_2 x_1\), and solve for \(x_1\), giving \(x_1=\frac{-\beta_1}{2\beta_2}\). You include this result as a derived quantity in your JAGS code.
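As a quick sanity check, plugging the posterior means reported in the summary above into this formula puts the optimum a bit less than one standard deviation above the mean elevation:

```r
# Posterior means of the linear and quadratic elevation coefficients,
# copied from the summary above.
beta2 <- 1.9660
beta3 <- -1.0851
-beta2 / (2 * beta3)  # optimum standardized elevation, about 0.91
```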
par(mfrow = c(1, 2))
psi <- summary(zj$psiPredict, quantile, c(.025, .5, .975))$stat
plot(elevation_Predict, psi[2,], type = "l", ylim = c(0,1), xlab = "Elevation (m)", lwd = 2,
ylab ="Probability of occupancy")
lines(elevation_Predict, psi[1,], type = "l", lty = "dashed", lwd = 2)
lines(elevation_Predict, psi[3,], type = "l", lty = "dashed", lwd = 2)
abline(v = mean(zj$elevationMax), lwd = 2)
text(1300, .9, "Optimum elevation", cex = .8)
hist(zj$elevationMax, breaks = 500, main = "", xlab = "Elevation (m)", xlim = c(1500, 2500),
freq = FALSE, col = "azure1")
abline(v = summary(zj$elevationMax, quantile, c(.975))$stat, lty = "dashed", lwd = 2)
abline(v = summary(zj$elevationMax, quantile, c(.025))$stat, lty = "dashed", lwd = 2)
Fig. 2. Probability of occupancy at the mean forest cover as a function of elevation (left panel) and the posterior density of the optimum elevation at the mean forest cover (right panel). Dashed lines give the .025 and .975 quantiles, which bound the .95 equal-tailed Bayesian credible interval.
Royle, J.A., and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations, and Communities. Academic Press, London, United Kingdom.